How Rings assign address space

Step 1: align incoming address to self (to some power of 2)
Step 2: assign the result to self address

Step 3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step 4: send down next_addr
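
The four steps can be sketched in a few lines. This is only an illustration under stated assumptions: "align" is taken to mean rounding the incoming address up to a multiple of the member's own address space, and the names `align_up` and `enumerate_ring` are hypothetical, not taken from the drawings.

```python
def align_up(addr, size):
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(first_addr, member_spaces):
    """Walk the ring once and return (self addresses, next free address).

    member_spaces lists each member's address space size, in ring order.
    Rounding up is an assumption; the source only says "align".
    """
    incoming = first_addr
    self_addrs = []
    for space in member_spaces:
        self_addr = align_up(incoming, space)  # steps 1 and 2
        self_addrs.append(self_addr)
        incoming = self_addr + space           # step 3, sent down in step 4
    return self_addrs, incoming


addresses, next_free = enumerate_ring(8, [8, 512, 16])
print(addresses, next_free)
```

With an incoming address of 8 and member spaces of 8, 512 and 16, the members settle at 8, 512 and 1024, so a large member simply skips ahead to its own alignment boundary.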




[Figure: enumeration example; legible labels: Addr=8, Addr=512]
clk1



If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flip-flop, we have a race on our
hands, meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
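
The race condition stated above reduces to a single comparison. The sketch below is a simplification that ignores flip-flop hold time and process variation; the function name `has_race` and the use of nanoseconds are assumptions made for illustration.

```python
def has_race(clock_skew_ns, q_to_d_delay_ns):
    """Return True when the second flip-flop can sample Q a cycle early.

    clock_skew_ns:   delay between clk1 and clk2
    q_to_d_delay_ns: delay from Q of the first flip-flop to D of the second
    """
    return clock_skew_ns > q_to_d_delay_ns


print(has_race(0.5, 0.3))  # skew exceeds the data delay
print(has_race(0.2, 0.3))  # data delay covers the skew
```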



clk2 




compound A 



no 



compound B 



clock runs with data

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, that it is easily
checked after layout.
clock opposes data

This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).

data_b

compound

data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



[Figure: compound A and compound B with clka and the OK signal; reference numeral 90]
[Waveform figure: data and clock lines between ring members; traces for clka, clkb and data_a, with the uncertainty range marked]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb. clkb
lags a lot behind clka of the bridge. As clearly seen
from the waveforms, a race is imminent. Here we
should add latches for all the data lines (90).
Adding a latch works, however, only if the delay between
clka and clkb is less than 75% of the cycle time;
otherwise the uncertainty kills the usable time. This
sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
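
The 75% rule and the resulting limit on ring size can be sketched as follows. The assumption that clock lag accumulates by a fixed amount per member is made here for illustration only, and both function names are hypothetical.

```python
def latch_fix_is_usable(clk_lag, cycle_time, limit=0.75):
    """True when the clka-to-clkb lag is below the stated fraction of a cycle."""
    if cycle_time <= 0:
        raise ValueError("cycle_time must be positive")
    return clk_lag < limit * cycle_time


def max_members(lag_per_member, cycle_time, limit=0.75):
    """Largest member count whose accumulated lag still passes the rule.

    Assumes every member adds the same lag (an illustrative simplification).
    """
    if lag_per_member <= 0:
        raise ValueError("lag_per_member must be positive")
    count = 0
    while latch_fix_is_usable((count + 1) * lag_per_member, cycle_time, limit):
        count += 1
    return count


print(latch_fix_is_usable(7.0, 10.0))
print(max_members(1.0, 10.0))
```

With a 10 ns cycle and 1 ns of lag per member, seven members fit under the 7.5 ns budget and the eighth does not.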
data

clock

Uncertainty range

Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage, if the
lag is smaller than the better part of a clock cycle. This
solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of the clock delay; the OK signals
are covered automatically, and the last-leg data is also
covered. The only signal that is not safe is the OK from
the bridge to member "b". It will need a latch in "b".



big module

local_clock

local_data_out

data from
previous member

clk

clock

ring interface

The local clock lags behind the
ring-interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-interface flip-flops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the traffic peaks coincide.
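
Both latency effects can be made concrete by counting hops. The sketch assumes a one-directional ring visiting the modules in the order a, b, c, d, and it takes the second flow in the peak-latency example to be "b" to "c"; both points are assumptions, as are the function names.

```python
def ring_hops(order, src, dst):
    """Number of hops from src to dst on a one-directional ring."""
    return (order.index(dst) - order.index(src)) % len(order)


def path_links(order, src, dst):
    """List the (from, to) links a message crosses on its way to dst."""
    i = order.index(src)
    links = []
    for _ in range(ring_hops(order, src, dst)):
        j = (i + 1) % len(order)
        links.append((order[i], order[j]))
        i = j
    return links


ring = ["a", "b", "c", "d"]
print(ring_hops(ring, "a", "b"), ring_hops(ring, "c", "b"))
shared = set(path_links(ring, "a", "d")) & set(path_links(ring, "b", "c"))
print(shared)
```

Under these assumptions "a" reaches "b" in one hop while "c" needs three, and the "a" to "d" flow crosses the same link that carries the "b" to "c" traffic, which is where coinciding peaks would collide.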




Land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and quickly gets to D1. By the same magic,
a message from V1 to D2 also gets bypassed;
a message from V1 to D1 is handled directly.



3 near




Enumeration is started by the "Anchor",
which assigns address=1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.
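
A short sketch of how a bridge could end up with two labels during enumeration. The visiting order and the module names (`anchor`, `m2` and so on) are hypothetical; they are chosen only so that the bridge is reached third and sixth, as in the description.

```python
def enumerate_with_bridge(visit_order, bridge):
    """Label modules 1..N in visiting order; the bridge is visited twice.

    The first visit to the bridge becomes its "near" end, the second its "far" end.
    """
    labels = {}
    ends = {}
    for label, module in enumerate(visit_order, start=1):
        if module == bridge:
            end = "near" if "near" not in ends else "far"
            ends[end] = label
        else:
            labels[module] = label
    return labels, ends


order = ["anchor", "m2", "bridge", "m4", "m5", "bridge", "m7"]
labels, ends = enumerate_with_bridge(order, "bridge")
print(labels["anchor"], ends)
```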
msg1

msg2
land bridge

msg1 and msg2 arrive at the same time;
the bridge end must make a decision
which message to forward first.

It can be shown that an unwise decision can
lead to freeze-out, deadlock and option price
dropping to $5.

Therefore msg2 gets the priority.
Bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



3 near 




In a land bridge ring, the situation is trickier. If V2
sends a message to address=5, the land bridge
diverts it at the 11/far end. It will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
that go straight through are left alone.



F '3 



11 
[Figure: memberA driving type, data and ok, with scan test, reset and clk; waveform of clk, type (idle / msgA / msgB / idle) and ok]



During the first clock, OK remains active while type
is msgA. This means that on the next clock
memberA may send a new message; memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later, the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return to idle
could also be a change to the next message, if there was
one.
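
The waveform described above can be replayed with a small model of the sender. This is a behavioural sketch only: the rule "advance to the next message on the clock after OK was active, otherwise hold" is read from the description, and the function name `drive_type_line` is hypothetical.

```python
def drive_type_line(messages, ok_per_clock):
    """Return what the sender places on the type lines at each clock.

    messages:     messages waiting in the sender, in order
    ok_per_clock: value of OK seen at each clock (1 = receiver can accept)
    """
    pending = list(messages)
    current = pending.pop(0) if pending else "idle"
    wire = []
    for ok in ok_per_clock:
        wire.append(current)
        if ok:  # accepted: the next clock may carry something new
            current = pending.pop(0) if pending else "idle"
        # otherwise the same message is held for another clock
    return wire


print(drive_type_line(["msgA", "msgB"], [1, 0, 1, 1]))
```

With OK low on the second clock only, msgB stays on the lines for two clocks before type returns to idle.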



imsg iok 



omsg ook 




umsg 



uok 
The incoming messages are examined first
to see whether they are supervisor or work/program messages.
Work/program messages have an address field.
We check if it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the unused bits in the
implementation.

comparator

ours/through

incoming
address

address
split mask

part of the address
that enters the member
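
The split-mask comparison can be sketched as a pair of bitwise operations. The 20-bit address width and the name `decode_address` are assumptions made for illustration; the member's address space is assumed to be a power of 2, as the description requires.

```python
def decode_address(incoming, self_addr, space_size, addr_bits=20):
    """Return (ours, internal_address) for an incoming address.

    The split mask has 1s on the upper bits that are compared against the
    self-address register and 0s on the lower bits passed inside the member.
    addr_bits=20 is an assumed width, not a value confirmed by the text.
    """
    if space_size <= 0 or space_size & (space_size - 1):
        raise ValueError("space_size must be a power of 2")
    full = (1 << addr_bits) - 1
    split_mask = full & ~(space_size - 1)
    ours = (incoming & split_mask) == (self_addr & split_mask)
    internal = incoming & ~split_mask & full
    return ours, internal


print(decode_address(0x20A, 0x200, 0x200))  # inside this member's space
print(decode_address(0x40A, 0x200, 0x200))  # passes through to the next member
```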
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Member 



Rif_I_* Rif_* Rif_o_* module id 



RIF 



Address Space = 7 



Activation register 

S 



300
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Member 



Rif_o_type[7:0] 

Rif_o_addr[19:0] 

Rif_o_datal/h[31:0]




300 
Member



Rif_I_* Rif_* Rif_o_* module id



RIF



Address Space = 7



Activation register



Fig. 33



300
The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.



3S + 



3<r 




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We
are using the I/O and maybe
some of the logic.



3(o 



Application 

Specific 
Accelerators 

CRC 
Encryption 
Table Lookup 
Hashing 

3** 



Internal Memory
Fast, Unified, Multi-port 



~0 ^ 

Network Processor 



Peripheral Expansion 

Enet, ATM, Uart, USB, Serials 



System 
Expansion 
Area 

CPU (PP) 

DMA 
Smart FIFO 
Ext. mem I/F

36£ 



5So 
Internal Memory

Fetch Unit
FTU

Load/Store
LSU

Program Sequencer
PSU

Vobla Core

PBU

Arithmetic
DALU

Register File
RFU

Core Debug

Agent Bus

Doorbell

DMA Agent

External World I/F

Vobla Compound
stream data out

32 registers of
32 bits per task

A set of indications
per task which
control task execution

An interface to
adjacent resources

Fast memory accessed
by load/store
instructions

per-task and
global registers

initialized by
the PP

Big area accessed
via a DMA interface
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£M ref Lsrer: 



Ft 5 • / 



s - sticky bit

eq - equal / zero

lt - less than / negative

gt - greater than / positive

c - carry

mb - reflection of the RAM multi-reader busy indication.



[Figure: register bit-field layout, bits 31 down to 0; legible field names include TRAP, MASK and TASK SPR]
Frame structure
of an example
task type

common
task data

task fragment 1
data

task fragment 2
data

task fragment 3
data

data of all level 2 functions

level 1 f1
data

level 1 f2
data

level 1 f3
data

size of level 0
frame part is
different for
each task
type

size of level 2
frame part is
constant

size of level 1
frame part is
different per
each task
type
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- Flip Flop 



3> - Logic 
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typc_ in 
? intcrface_atWfC3s( J 5 .0 
Pjnicrfacc_datal3 1 :0] 



type out[7:0]<«-«uCRC ) 
address_out[23:0] 



data_out[63:0] 
(alsfUoCRC) 

P»l'lk 
5 okZdnvc 



reset 




agent 



options(5:01 




RA 



AID[5.0] 



RB/imm8 



RA 1 




source addrcss[23:0] ||dretinatioi*ddrc,'vs[23:03 

increment first snoop last source addr in sram address of destenation number of bytes 
SSISJ-COPIWOPI) <•*» - - - - - to transfer 

(op3> 



Ff 'd 
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Vobla 



agent Interface 



main 
entry 



shadow 
entry 



output 
message 
encoder 



type out[7:0] 
uddrgss_ out[23:0] 
1daTa^uT[63:0] 



^ reset 



elk 



ok2 drive 



agent command
(AID=message_sender)
(option[6]=0)

opcode

options[9:0]

RA

AID[4:0]

RB/imm8

{…, options[5:0]} | raw_data[31:0] | raw_address[23:0]

message data

address_of_destination    message type

agent command
(AID=message_sender)
(option[6]=1)

opcode

options[9:0]
{…, options[5:0]} | raw_data

AID[4:0]

RB

raw_address[23:0]

10000000

message data

address_of_destination    message type



5' 
6Z 



rjniset mask 



din set mask 



doorbell' 



dnrset mask 



dirT'sct mask 



Vobla 



gent interfc ce 
stall vobla 



token 
control 



c_id[5:0] 



request 
entries 
X2 



if 



DMA 
context table 



input 
message 
decoder 



output 

message 
encoder 



_type_in[7:0] 
wru? interface addrcss(2 3 :0] 

\vTUp_intcrface_datB[3 1 :0] 
ngbase_addrcss[7 :0j 

type out[7:0) 

addrSs_ out[23;Oj 

"lmK!uT[63:0] 



"A 



*ok2 drive 
reset 
agent command
(AID=dma agent)



dma request M 
entry 1 — i 

mtidify urgent dir autosend long 
address (OP2) (OP I) set ack addfc 



(OP2) (OP I) set ack address 
<OPO) <OP3)(OPIO> 



dram address 



sram address 



number of bytes
to transfer or
address modifier



last in transfer



calculate crc



multireader



last in frame



Vobla 



rr )ultireadjataf 3 1 :0] 

eader 
igent i njerf ape 
stall vobla 



register 




input 


file 




buffer 


* 





tx data mux 



, on dmand 



CRC data 



random 
number 
gencrato 
(TBD) 



crc — 

[3,10,32 
machine 



checksum 
machine 



bip16 
machine 



elk 



reset 



Ft $ • 
fTfT 



Fi«. « 



agent command
(AID=CRC agent)

opcode

options[9:0]

RA

AID[4:0]

RB

size[2:0]

G

S

CRC type    data size

DATA[63:32]    DATA[31:0]

check

overwrite
residue

RESIDUE
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Vobla 



igent interfa ce 



control 
register 



counter 




pre scale* + J IV \ 
by z 



time stamp 
register 



elk 



reset 



p74 



agent command 


opcode 


options[9:0]


RA 


AID 


RB/imm8 


(AID^tinw agent) 


l l 




ft 



"A 
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Y 33rt 



mask control 




priority logic 


logic 







^ rif i res et 
S pU„ Stock 



7. 



agent comma 
(AID»doorbell 



mdj op 
II agSRTT 



opcode 



options[9:0] 




RA 



AID14:0] 



^71 I glonal tas 



I global task 
[register mask 



T 



set'e lear clear clear 
global request mask * 
(OP2> (OPH COPO) T 



{ 0,0,0,0,0,mask_bit_index[2:0] } write mask
or

{ 0,0,0,0,1,req_bit_index[2:0] } write request
or

{ 1,0,0,0,0,count_value[2:0] } write counter
or

{ 0,1,0,0,0,0,0,0 } write TGMR
or

{ 1,1,0,0,0,0,0,urgent_value } write urgent
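
The five encodings listed above can be told apart by their upper bits. The decoder below is a sketch that assumes the leftmost element of each brace list is bit 7 of an 8-bit field, as in a Verilog concatenation; the function name and the returned tags are hypothetical.

```python
def decode_doorbell_command(byte):
    """Classify an 8-bit doorbell command field into (operation, value)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    top5 = byte >> 3       # bits 7..3 select the operation
    low3 = byte & 0b111    # bits 2..0 carry an index or a count
    if top5 == 0b00000:
        return ("write_mask", low3)
    if top5 == 0b00001:
        return ("write_request", low3)
    if top5 == 0b10000:
        return ("write_counter", low3)
    if byte == 0b01000000:
        return ("write_tgmr", None)
    if byte >> 1 == 0b1100000:
        return ("write_urgent", byte & 1)
    return ("unknown", byte)


print(decode_doorbell_command(0b00001010))
print(decode_doorbell_command(0b11000001))
```

Any pattern outside the five listed forms is reported as unknown rather than guessed at, since the drawings do not say how such values are treated.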



Fif 

yo 
Memory

Trajan chip

Memory
port

CS

CS

Data

Addr

FIFO_RD

FIFO_WR

WR FIFO

RD FIFO

FPGA

Encryption

PCI

Host #1 (PP)
Host #2 (DMA)

Ring
interface

Message

Write request
generator

Read request
generator

WR FIFO used
words counter

RD FIFO used
words counter

Host #3 (Ring extender)

Memory controller

Trajan

FIFO_RD
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Trajan memory 
interface 



Traian FIFO_RD FIFO_R p 
input 



Trajan F1FOJWR FIFO_W t_ 



DPR 



Host address 
generator 



31 



Target address 
generator 



Synchronizer 



Synchroniser 



Interface to 
external 
device 
(device 
specific) 



interrupt 



Synchronizer 



FPGA 



(*1 
ring in

setup registers:

doorbell, taskid and viscode
urgent viscode
header len and address
threshold to urgent

doorbells

ring c

rx status word: free, crc, ovrn, err, last, size










J 
ring in

enet tx mac

fifo

ram

setup registers:

doorbell, taskid and viscode
urgent viscode
send ahead address
threshold to transmit
threshold to urgent

doorbell
free entry count
finished frames count

ring control

ring c



(A 
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Internal Memory 



_Prapram 



Instruction 



DaiaT 
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rekyd/Bump ^ Bump teg&feif .Sag 



k 4 ' 



src 2 
dest 



Vobla Core 



~ Agent Bus 



I ipSJ ispsasa r?«wgi ess: 
External World

Instruction
fetch request

Instruction
decode

Address
calculation.
Data access req.

Read source
registers.
Data execution

Write result
into destination
register

WRITE
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• Flip Flop 
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-snap 
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AAU>, AAL ' Z. AAL1, JUU.0 
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Protocol specific 

data path 
functional blocks 



r 




Generic data path 

processing 
functional blocks 



wfigrgi ugHF 



Generic system 

service 
functional blocks 




Fault and exception report



Dofcofl support 



Vobla maintenance 
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